-
Notifications
You must be signed in to change notification settings - Fork 930
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added support for Paeth PNG filter compression (predictor value = 4) #537
Conversation
Kind of random, but I was having issues with reading certain pdfs. Implementing your code in this patch fixed my problem completely and, as far as I can tell, I've had no errors since including it. Looking forward for this being integrated into the base package! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @edugonza,
Thanks for opening this PR and taking the time to improve pdfminer.six.
The code of the PR looks good. I did not check if the actual implementation of the type 4 predictor matches the libpng specification. But since you added the link to the spec yourself I think it's ok.
One request: can add a test case with a pdf with this type of path predictor? I know yours is proprietary but perhaps you can find an example on the internet or even generate one.
@cslotboom is it possible to share your pdf with a type 4 png path predictor? |
Hi @pietermarsman, Unfortunately, I was not able to create a PDF with a PNG encoded with this type of predictor. I do not have any open/free document to use. If you know or have documentation on how to create such a document, I could try again. Best regards |
I'm not 100% I have an example that uses this type of predictor explicitly, I just know that this code has fixed my errors. I'm using pdfminer to read lines in PDF drawings. From what I've found, any PDF with lines in it has a risk of not working without this patch. I suspect that the scale/rounding of the pdf had something to do with it, as it seemed random which pdfs worked and which didn't. |
this also fixed issues for me. thanks @edugonza! |
For reference, this is roughly the type of pdf that would fail without this patch. Can not say for certain if it has a 4 png path predictor or not. |
@cslotboom The *TestFloor.pdf` is also working using the development branch of pdfminer.six. The *2018.pdf` shared in the linked issue (I attached it here again) does indeed show that it fixes the issue. |
@edugonza Thanks for your work! Do you have some references for the changes to predictor values of 1 and 3? This changes current behavior and I'm always very careful with that, but it also looks like it is an improvement. Anyway, I would like a comment that explains what the algorithm is based on because (I think) it is not part of the PNG specification you already linked. Let me know if you have something. |
Hi @pietermarsman , If I remember well, the PNG specification in the link describes how to apply the filter to raw data. However, the piece of code at hand performs the inverse operation, to decode data from the filtered version to raw. That is why the code does not seem to apply the filter as it is, but undoes what the specification describes. I hope that explains the differences. |
pdfminer/utils.py
Outdated
for j, r in enumerate(line1): | ||
left = int(line2[j-bpp]) if j-bpp >= 0 else 0 | ||
up = int(line0[j]) | ||
c = ((r + math.floor(left + up)) // 2) & 255 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line must be:
c = (r + math.floor(left + up) // 2) & 255
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to be wrong in the current implementation as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed it in the latest commit
…nd add pieces of the docs to show what is going on.
@edugonza I went over all the filter types, refactored the names for better understanding and now I'm pretty confident that everything is according to the docs. I want to check one last time before I merge. If you have time, can you validate/check/compare the current implementation against the docs as well? I would like to get this right in one go :) |
@edugonza thanks for all the work! |
Hi @pietermarsman, Sorry for my late reply. Thank you for all the fixes and corrections. |
Pull request
Fixes #339 Added support for Paeth PNG compression filter (predictor value = 4). Fixed bugs in other filters.
How Has This Been Tested?
I have tested this with a private PDF file that I am not allowed to share.
Checklist
works
version
is not necessary
verified that this is not necessary
CHANGELOG.md